THE BLOCK CIPHER NSABC (PUBLIC DOMAIN) 



ALICE NGUYENOVA-STEPANIKOVA (*) AND TRAN NGOC DUONG (**) 

Abstract. We introduce NSABC/w, the Nice-Structured Algebraic Block Cipher
using w-bit word arithmetic, a 4w-bit analogue of Skipjack [NSA98] with a
5w-bit key. Skipjack's internal 4-round Feistel structure is replaced with
a w-bit, 2-round cascade of a binary operation $(x, z) \mapsto (x \boxdot z) \lll (w/2)$ that
permutes a text word x under control of a key word z. The operation, similarly
to the multiplication in IDEA [LM91, LMM91], is based on an algebraic
group over w-bit words, so it is also capable of decrypting by means of the
inverse element of z in the group. The cipher utilizes a secret 4w-bit tweak,
an easily changeable parameter with a unique value for each block encrypted
under the same key [LRW02], that is derived from the block index and an
additional 4w-bit key. A software implementation for w = 64 takes circa 9
clock cycles per byte on x86-64 processors.



1. Introduction 

In today's world, full of crypto algorithms, one may wonder what makes a
block cipher attractive.

In the authors' opinion, the answer to the question is one word: elegance. If
something looks nice, then there is a good chance that it is also good.

An elegant specification is easier to memorize. Memorability makes the algorithm
easier to implement and to analyze, which allows for fruitful cryptanalytic results, leading
to deeper understanding which, in turn, gives greater confidence in the algorithm.

The elegance comprises the following features: 

• Few algebraic operations. Using many operations results in hard-to-analyze
and possibly undesirable interactions between them.

• Simple and regular key schedule. A complex key schedule, which effectively
adds another, unrelated, function to the cipher, results in hard-to-analyze
and possibly undesirable interactions between the functions.

IDEA, a secure block cipher designed by Xuejia Lai and James L. Massey [LM91,
LMM91], is an example of elegance. Besides being elegant with an efficient choice
and arrangement of algebraic operations, it is elegant for some more features:

• The use of incompatible group operations, where incompatible means there
are no simple relations (such as distributivity) between them. The
incompatibility eliminates any exploitable algebraic property and thus makes it
infeasible to solve the cipher algebraically.

• The use of modular multiplication. Multiplication produces huge mathematical
complexity while consuming few clock cycles on modern processors.
It thus greatly contributes to the security and efficiency of the cipher.

However, IDEA uses multiplication modulo the Fermat prime $2^w + 1$, which does
not exist for w = 32 or w = 64, making it not extendable to the machine word lengths
of today. Furthermore, its key schedule is rather irregular due to the rotation of
the primary key.

Skipjack, a secure block cipher designed by the U.S. National Security Agency
[NSA98], is another example of elegant design. Besides being elegant with an
efficient, simple and regular key schedule, it is elegant for one more feature: the use
of two ciphers, namely an outer cipher, or wrapper, consisting of the first and last rounds,
and an inner cipher, or core, consisting of the middle rounds.

The terms "core" and "wrapper" were introduced in the design rationale of a
structural analogue of Skipjack: the block cipher MARS [IBM98]. MARS's designers
justify this two-layer structure by writing that it breaks any repetitious property,
makes any iterative characteristic impossible, and prevents any propagation of
eventual vulnerabilities in either layer to the other one, thus making attacks more
difficult. The wrapper is primarily aimed at fast diffusion and the core primarily
at strong confusion. As Claude E. Shannon termed them in his pioneering work [Sha49],
diffusion here refers to the process of letting each input bit affect many output bits
(or, equivalently, each output bit be affected by many input bits), and confusion
here refers to the process of making that dependence very involved, possibly by applying
it multiple times in very different ways. If a cipher is seen as a polynomial map
from the plaintext and the key to the ciphertext, then the methods of diffusion and
confusion can be described as the effort of making the polynomials as complete as
possible, i.e. such that they contain virtually all terms at all degrees. This algebraic
approach is very evident in the structure of Skipjack (see Figure 3.1). Skipjack (as
opposed to MARS) is moreover elegant in that its wrapper is, in essence,
the inverse function of its core.

However, Skipjack uses an S-box, which renders it rather slow, hard to program in
a secure and efficient manner and, as such, not extendable to large machine word
lengths.

This article describes an attempt to combine the elegant idea of using incompatible
and complex machine-oriented algebraic operations in IDEA with the elegant
structure of Skipjack into a scalable and tweakable block cipher called NSABC,
the Nice-Structured Algebraic Block Cipher.

NSABC is scalable. It is defined for every even word length w. It encrypts a
4w-bit text block under a 5w-bit key, and thus scales up in 8-bit increments
of block length and 10-bit increments of key length.

NSABC is tweakable. It can use an easily changeable 4w-bit parameter, called the
tweak [LRW02], to make a unique version of the cipher for every block encrypted
under the same key. Included in the specification is a formula for changing the
tweak.

NSABC adopts the entire overall structure of Skipjack, including the
key schedule, and only replaces the internal 4-round Feistel structure of Skipjack
with another structure. The new structure consists of two rounds of the binary
operation $(x, z) \mapsto (x \boxdot_e z) \lll (w/2)$, which encrypts a text word x using a key word
z and a key-dependent word e. The operation $\boxdot_e$ is derived from an algebraic group
over w-bit words taking e as the unit element, so it is also capable of decrypting by
means of the inverse element of z in the group. The two rounds are separated by
an exclusive-or (XOR) operation that modifies the current text word by a tweak
word.

NSABC is put in the public domain. As it is based on Skipjack, prospective users should
be aware of patent(s) that may possibly be held by the U.S. Government and take
steps to make sure the use is free of legal issues. We (the designers of NSABC) are
not aware of any patent related to other parts of the design.

The rest of the article is organized as follows. Section 2 defines operations
and notations. Section 3 specifies the cipher. Section 4 gives a numerical example.
Section 5 suggests some implementation techniques. Section 6 concludes the article.
Source code of software implementations is given in the Appendices.



2. Definitions

2.1. Operations on words. Throughout this article, w denotes the machine word
length. We use the symbols $\boxplus$, $\boxminus$, $\boxtimes$ and $(\cdot)^{-1}$ to denote addition, subtraction
(and arithmetic negation), multiplication and multiplicative inversion, respectively,
modulo $2^w$ (unless otherwise said). We use the symbols $\neg$ and $\oplus$ to denote bit-wise
complement and exclusive-or (XOR) on w-bit operands (unless otherwise said).
We write $x \lll n$ to denote leftward rotation (i.e. cyclic shift toward the most
significant bit) of x, which is always a w-bit word, by n bits. For even w, the symbol
$(\cdot)^S$ denotes swapping the high and low order halves, i.e. $x^S = x \lll (w/2)$.

Let us define the binary operation $\odot$ by

    $x \odot y = 2xy \boxplus x \boxplus y$

and the binary operation $\boxdot$ by

    $x \boxdot y = 2xy \boxplus x \boxminus y$

The bivariate polynomials on the right hand side are permutation polynomials
in either variable for every fixed value of the other variable [Riv99]. In other words,
$\odot$ and $\boxdot$ are quasi-group operations.

Furthermore, $\odot$ is a group operation over the set of w-bit numbers. (This fact,
although simple and straightforward, does not seem to have been mentioned in the
literature.) It becomes obvious by considering an alternative definition for the
operation [Mey97]: it can be done by dropping the rightmost bit, which is always "1",
of the product modulo $2^{w+1}$ of the operands each appended with a "1" bit. Symbolically,

    $x \odot y = [(2x + 1)(2y + 1) - 1] / 2 \pmod{2^w}$

The group defined by $\odot$ is thus isomorphic to the multiplicative group of odd
integers modulo $2^{w+1}$, via the isomorphism

    $x \mapsto 2x + 1$

The unit (i.e., identity) element of the group is 0. The inverse element of x,
denoted $\bar{x}$, is

    $\bar{x} = \boxminus x \boxtimes (2x \boxplus 1)^{-1}$

The following relations are obvious:

    $x \odot y = \boxminus [(\boxminus x) \boxdot y]$

    $x \boxdot y = \boxminus [(\boxminus x) \odot y]$

Since the unary operator $\boxminus$ is an involution, the following relations hold.

    $(x \boxdot y) \boxdot z = x \boxdot (y \odot z)$

    $x \boxdot 0 = x$
Notice that 0 is the right unit element w.r.t. the operation $\boxdot$. Hence

    $(x \boxdot y) \boxdot \bar{y} = x$

which means that $\bar{y}$ is also the right inverse element of y w.r.t. the $\boxdot$ operation.
Since $(\neg x) \boxplus x = \boxminus 1$ holds for every x, the following relations hold.

    $(\neg x) \odot y = \neg (x \odot y)$

    $(1 \boxminus x) \boxdot y = 1 \boxminus (x \boxdot y)$
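
The definitions and identities above are easy to check numerically. The following is a minimal Python sketch, not the authors' reference code; the word length W = 16 and the function names (`odot`, `boxdot`, `bar`, `odot_via_odd_product`) are assumptions chosen for illustration.

```python
W = 16                      # word length in bits (assumed for illustration)
MASK = (1 << W) - 1


def odot(x, y):
    """Group operation: 2xy + x + y (mod 2^w)."""
    return (2 * x * y + x + y) & MASK


def boxdot(x, y):
    """Quasi-group operation: 2xy + x - y (mod 2^w)."""
    return (2 * x * y + x - y) & MASK


def bar(x):
    """Inverse of x w.r.t. odot: -x * (2x + 1)^(-1) (mod 2^w)."""
    # 2x + 1 is odd, hence invertible modulo 2^w (needs Python 3.8+).
    return (-x * pow(2 * x + 1, -1, 1 << W)) & MASK


def odot_via_odd_product(x, y):
    """Alternative definition: multiply 2x+1 by 2y+1 mod 2^(w+1), drop the low bit."""
    return (((2 * x + 1) * (2 * y + 1)) % (1 << (W + 1))) >> 1
```

Since `bar(y)` undoes a right operand y under `boxdot`, the same routine serves both the group inverse and the right inverse mentioned above.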
Let e be a fixed w-bit number. Let us define the binary operations $\odot_e$ and $\boxdot_e$ by

    $x \odot_e y = (x \boxminus e) \odot (y \boxminus e) \boxplus e = 2xy \boxplus (1 - 2e)(x + y - e)$

    $x \boxdot_e y = (x \boxplus e) \boxdot (y \boxminus e) \boxminus e = 2xy \boxplus (1 - 2e)(x - y + e)$

Then $\odot_e$ and $\boxdot_e$ are quasi-group operations over the set of w-bit numbers. This
follows from a more general fact that the right-hand side trivariate polynomials are
permutations in either variable while keeping the other two fixed [Riv99]. Actually,
the symbols $\odot_e$ and $\boxdot_e$ each define an entire family of binary operations, of which
each is uniquely determined by e.

Furthermore, from the definition it immediately follows that $\odot_e$ is a group
operation; namely, the group is isomorphic to the one defined by $\odot$ via the isomorphism

    $x \mapsto x \boxminus e$

The unit element of the group is e.

The inverse element of x in the group, denoted $\frac{e}{x}$, is

    $\frac{e}{x} = \overline{x \boxminus e} \boxplus e = [(2e - 1)x \boxminus 2e(e - 1)] \boxtimes [2(x - e) \boxplus 1]^{-1}$

Simple calculation proves the following relations.

    $x \odot_e y = \boxminus [(\boxminus x) \boxdot_e y]$

    $x \boxdot_e y = \boxminus [(\boxminus x) \odot_e y]$

    $(1 \boxminus 2e \boxminus x) \boxdot_e y = 1 \boxminus 2e \boxminus (x \boxdot_e y)$

    $(x \boxdot_e y) \boxdot_e z = x \boxdot_e (y \odot_e z)$

    $(x \boxdot_e y) \boxdot_e \frac{e}{y} = x$

Notice that e is also the right unit element w.r.t. $\boxdot_e$, and $\frac{e}{y}$ also the right
inverse element of y w.r.t. $\boxdot_e$. The operation $\boxdot_e$, which is non-commutative
and non-associative, will be used for encryption and, due to the existence of right
inversion, also for decryption.
2.2. Order notations. We write multi-part data values in string (or number)
notation or tuple (or vector) notation. In string notation, the value is written as a
sequence of symbols, possibly separated by insignificant space(s). In tuple
notation, the value is written as a sequence, in parentheses, of comma-separated
symbols.

For example, z y x and 43 210 are in string notation; (x, y, z) and (0, 1, 2, 3, 4)
are in tuple notation.

The string notation indicates high-first order: the first (i.e. leftmost) symbol
denotes the most significant part of the value when it is interpreted as a number.

Conversely, the tuple notation indicates low-first order: the first symbol denotes
the least significant part of the value when it is interpreted as a number.

For example, when interpreting a 3-word number, $x_2$ denotes the most significant
word of $x_2 x_1 x_0$ and $x_0$ denotes the least significant word of $(x_0, x_1, x_2)$.

The same value may appear in either notation. Thus, for example, for every a,
b, c and d,

    a b c d = (d, c, b, a)

The term part introduced above usually refers to "word", but it may also refer to
"digit" [of a number], "component" [of a tuple or vector], as well as a group thereof.
If, for example, x, y, z, t are 1-digit, 2-digit, 3-digit and 4-digit values respectively,
then (x, y, z, t) = 9876543210 means x = 0, y = 21, z = 543 and t = 9876.

Note that the use of "string notation" and "number notation" as synonyms
does not mean that big-endian data ordering is mandated. In order to avoid
security-irrelevant details, we do not specify endianness. We nevertheless provide a "reference"
implementation in C++, where every octet string is considered as a [generally
multi-word] number with the first octet taken as the least significant one. The
implementation thus interprets octet strings as numbers in little-endian order.
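
To make the two notations concrete, the following Python sketch converts between a number (string notation, high-first) and a tuple of words (tuple notation, low-first). The helper names `to_words` and `from_words` are hypothetical, introduced here only for illustration.

```python
def to_words(value, n_words, w):
    """Split a number into n_words w-bit words, least significant word first."""
    mask = (1 << w) - 1
    return tuple((value >> (w * i)) & mask for i in range(n_words))


def from_words(words, w):
    """Reassemble a number from a low-first tuple of w-bit words."""
    return sum(word << (w * i) for i, word in enumerate(words))


# The 4-word block 0x0123456789ABCDEF (string notation) for w = 16:
block = to_words(0x0123456789ABCDEF, 4, 16)

# Little-endian octet strings, as in the reference implementation's convention:
number = int.from_bytes(bytes([0xEF, 0xCD, 0xAB, 0x89]), "little")
```

Here `block` holds the words in the order $(x_0, x_1, x_2, x_3)$, so its first component is the least significant word.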

2.3. Operations on word strings. Let $(\cdot)^R$ denote the permutation that reverses
the word order of a non-empty word string. For example, for w = 8,

    $\mathtt{0x0123ABCD}^R = \mathtt{0xCDAB2301}$

Let $(\cdot)^S$ denote the permutation that swaps the high order and low order halves
of every word of a non-empty word string. For example, for w = 8,

    $\mathtt{0x0123ABCD}^S = \mathtt{0x1032BADC}$

The operator $\oplus$ on word strings denotes word-wise application of $\oplus$. For example,

    $(a_0, a_1, a_2, \ldots) \oplus (b_0, b_1, b_2, \ldots) = (a_0 \oplus b_0, a_1 \oplus b_1, a_2 \oplus b_2, \ldots)$

Unless otherwise said, operators $\boxplus$ and $\boxminus$ on word strings denote word-wise
modular addition and subtraction, respectively. For example,

    $(a_0, a_1, a_2, \ldots) \boxplus (b_0, b_1, b_2, \ldots) = (a_0 \boxplus b_0, a_1 \boxplus b_1, a_2 \boxplus b_2, \ldots)$

    $(a_0, a_1, a_2, \ldots) \boxminus (b_0, b_1, b_2, \ldots) = (a_0 \boxminus b_0, a_1 \boxminus b_1, a_2 \boxminus b_2, \ldots)$

Let $\overline{(\cdot)}$ denote the word-wise application of the inversion operator $\bar{x}$ on a word
string. For example,

    $\overline{(a, b, c, \ldots)} = (\bar{a}, \bar{b}, \bar{c}, \ldots)$

Given word strings E and X of the same length, let $\frac{E}{X}$ denote the word-wise
application of the inversion operator $\frac{e}{x}$ on every word x of X with the
index-matching word of E taken as the [right] unit element e. For example,

    $\frac{(e_1, e_2, e_3, \ldots)}{(x_1, x_2, x_3, \ldots)} = \left(\frac{e_1}{x_1}, \frac{e_2}{x_2}, \frac{e_3}{x_3}, \ldots\right)$

Operations on word strings are used in this article only to express the decryption
function explicitly.
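
The two permutations on word strings can be sketched in Python as follows. The helper names (`split`, `join`, `rev_words`, `swap_halves`) are assumptions made for this illustration; word strings are passed as numbers together with their word count and word length.

```python
def split(value, n_words, w):
    """Number -> list of w-bit words, least significant first."""
    mask = (1 << w) - 1
    return [(value >> (w * i)) & mask for i in range(n_words)]


def join(words, w):
    """List of w-bit words (least significant first) -> number."""
    return sum(word << (w * i) for i, word in enumerate(words))


def rev_words(value, n_words, w):
    """The permutation R: reverse the word order."""
    return join(split(value, n_words, w)[::-1], w)


def swap_halves(value, n_words, w):
    """The permutation S: swap the high and low halves of every word."""
    h, mask = w // 2, (1 << w) - 1
    return join([((x << h) | (x >> h)) & mask for x in split(value, n_words, w)], w)
```

Both permutations are involutions, and they commute, so the composite written as RS in the decryption formula can be applied in either order.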

3. Specification 

This section provides details of NSABC/w. From now on w, the word length,
must be even.

Throughout this article, X denotes a 4w-bit plaintext block, Y a 4w-bit ciphertext
block, Z a 5w-bit key, T a 4w-bit secret tweak, i.e. a value that is used to
encrypt only one block under the key, and U a w-bit unit key, i.e. an additional key
that generates right unit elements for the underlying quasi-groups.

Tweaking is optional. It may be disabled by keeping T constant (like Z and
U) while encrypting many blocks. When tweaking is disabled, NSABC becomes a
conventional, non-tweakable, block cipher.

Mathematically, the cipher is given by two functions,
    ENCRYPT(X, Z, T, U), which encrypts X under control of Z, T and U,
    DECRYPT(Y, Z, T, U), which decrypts Y under control of Z, T and U,
satisfying the apparent relation

    DECRYPT(ENCRYPT(X, Z, T, U), Z, T, U) = X

The function ENCRYPT is defined in terms of four functions:
    CRYPT, a text encryption function that encrypts a plaintext block using a key
        schedule, a unit schedule and a tweak schedule;
    KE, a key expansion function, that generates the key schedule from Z;
    UE, a unit element function, that generates the unit schedule from U; and
    TE, a tweak expansion function, that generates the tweak schedule from T.


Algorithm 1 Function ENCRYPT

Input:
    X    4w-bit plaintext block
    Z    5w-bit key
    T    4w-bit tweak
    U    w-bit unit key
Output:
    Y    4w-bit ciphertext block
Relation:
    ENCRYPT(X, Z, T, U) = CRYPT(X, KE(Z), UE(U), TE(T))

An explicit relation for ENCRYPT is given in Algorithm 1. An explicit relation
for DECRYPT is given in Algorithm 7.

Mechanically, encryption is performed on a conceptual processor with a 4-word
text register $(x_0, x_1, x_2, x_3)$, a 5-word key register $(z_0, z_1, z_2, z_3, z_4)$, a 4-word tweak
register $(t_0, t_1, t_2, t_3)$ and a one-word unit register u. The key register is initially loaded
with the key Z. The tweak register is initially loaded with the tweak T. The unit
register is initially loaded with the unit key U. The text register is initially loaded
with the plaintext block X and finally it contains the ciphertext block Y.

Figure 3.1. Representative rounds.

The concrete, vector notation here specifies the order of words so, for example,
$x_0$ is initially loaded with the least significant word of X and finally it contains the
least significant word of Y.

3.1. Text encryption. The text register $(x_0, x_1, x_2, x_3)$ is initially loaded with the
plaintext block X and finally it contains the ciphertext block Y.

Text encryption proceeds in 32 rounds of operations. A round is of either type
A or type B. The rounds are arranged in four passes: first eight rounds of type A,
then eight rounds of type B, then eight rounds of type A again, and finally eight rounds
of type B again.

For the k-th round, $0 \le k \le 31$, the text word $x_0$ is permuted, i.e. it is updated by
an execution unit called G-box that implements a permutation G on the set of word
values, and the contents of the text word $x_0$ are mixed, by exclusive-or (XOR), into
another text word that is either $x_1$ or $x_3$. The order of operations and the target
of mixing depend on the round type:

• For an A-typed round (see Figure 3.1 part A), G applies first, then the
mixing takes place and targets $x_1$. That is, the contents of $x_0$ enter the
G-box, the output value of the G-box is stored back to $x_0$, then the contents
of $x_0$ and $x_1$ are XOR'ed and the result is stored to $x_1$. The words $x_2$ and
$x_3$ are left unchanged.

• For a B-typed round (see Figure 3.1 part B), the mixing takes place and
targets $x_3$ first, then G applies. That is, the contents of $x_0$ and $x_3$ are
XOR'ed and the result is stored to $x_3$, then the contents of $x_0$ enter the
G-box and the output value of the G-box is stored back to $x_0$. The words $x_1$
and $x_2$ are left unchanged.

Besides the text input, the G-box also takes as its inputs an ordered pair of
w-bit key words $(K_{2k}, K_{2k+1})$, an ordered pair of w-bit unit words $(L_{2k}, L_{2k+1})$
and a w-bit tweak word $C_k$ (all shown in Fig. 3.1 and 3.2). The details on how key words,
unit words and tweak words are generated and used will be given in the subsequent
subsections.
Algorithm 2 Function CRYPT (text encryption)

Input:
    X    4w-bit plaintext block
    K    64w-bit key schedule
    L    64w-bit unit schedule
    C    32w-bit tweak schedule
Output:
    Y    4w-bit ciphertext block
Pseudo-code:
    $(x_0, x_1, x_2, x_3) \leftarrow X$
    for $k \leftarrow 0, 1, 2, \ldots, 31$ loop
        if $0 \le k < 8 \lor 16 \le k < 24$ then
            $x_0 \leftarrow G(x_0, (K_{2k}, K_{2k+1}), (L_{2k}, L_{2k+1}), C_k)$
            $x_1 \leftarrow x_1 \oplus x_0$
        elsif $8 \le k < 16 \lor 24 \le k < 32$ then
            $x_3 \leftarrow x_3 \oplus x_0$
            $x_0 \leftarrow G(x_0, (K_{2k}, K_{2k+1}), (L_{2k}, L_{2k+1}), C_k)$
        end if
        $(x_0, x_1, x_2, x_3) \leftarrow (x_1, x_2, x_3, x_0)$
    end loop
    $Y \leftarrow (x_0, x_1, x_2, x_3)$
Relations:
    $Y = (x_0^{(32)}, x_1^{(32)}, x_2^{(32)}, x_3^{(32)})$
    For $0 \le k < 8 \lor 16 \le k < 24$:
        $x_0^{(k+1)} = x_1^{(k)} \oplus g^{(k)}$
        $x_1^{(k+1)} = x_2^{(k)}$
        $x_2^{(k+1)} = x_3^{(k)}$
        $x_3^{(k+1)} = g^{(k)}$
    For $8 \le k < 16 \lor 24 \le k < 32$:
        $x_0^{(k+1)} = x_1^{(k)}$
        $x_1^{(k+1)} = x_2^{(k)}$
        $x_2^{(k+1)} = x_3^{(k)} \oplus x_0^{(k)}$
        $x_3^{(k+1)} = g^{(k)}$
    For $0 \le k < 32$:
        $g^{(k)} = G(x_0^{(k)}, (K_{2k}, K_{2k+1}), (L_{2k}, L_{2k+1}), C_k)$
    $(x_0^{(0)}, x_1^{(0)}, x_2^{(0)}, x_3^{(0)}) = X$
    $(K_0, K_1, K_2, \ldots, K_{63}) = K$
    $(L_0, L_1, L_2, \ldots, L_{63}) = L$
    $(C_0, C_1, C_2, \ldots, C_{31}) = C$



The encryption round is completed with a rotation by one word toward the
least significant word on the text register, i.e. the text register is modified by
simultaneously loading the word $x_0$ with the contents of the word $x_1$, $x_1$ with the
contents of $x_2$, $x_2$ with the contents of $x_3$, and $x_3$ with the contents of $x_0$.
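
The round structure alone (four passes of A and B rounds followed by a word rotation) can be sketched in Python as below. This is an illustration under assumptions, not the reference code: the keyed G-box is abstracted as a hypothetical callable `g(k, word)`, so that the sketch stays independent of how G is defined.

```python
def crypt_rounds(x, g):
    """Apply the 32-round A/B structure to a 4-word state (x0, x1, x2, x3).

    g(k, word) stands in for the keyed G-box of round k (hypothetical callable).
    """
    x0, x1, x2, x3 = x
    for k in range(32):
        if (k // 8) % 2 == 0:        # rounds 0..7 and 16..23: type A
            x0 = g(k, x0)            # G applies first
            x1 ^= x0                 # then x0 is mixed into x1
        else:                        # rounds 8..15 and 24..31: type B
            x3 ^= x0                 # x0 is mixed into x3 first
            x0 = g(k, x0)            # then G applies
        # rotate by one word toward the least significant word
        x0, x1, x2, x3 = x1, x2, x3, x0
    return (x0, x1, x2, x3)
```

Because every step is either an XOR or an application of a permutation, the structure is invertible whenever each `g(k, .)` is a permutation.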

3.2. Tweak schedule. The tweak register $(t_0, t_1, t_2, t_3)$ is initially loaded with the
tweak T. The tweak words are generated in 32 rounds of operations.

For the k-th round, $0 \le k \le 31$, the value of the word $t_0$ of the tweak register is taken
as the tweak word $C_k$ [which enters the G-box in the k-th encryption round]. Then,
similarly to the text register, the tweak register is rotated by one word toward the
least significant word (see Figure 3.1).

Algorithm 3 Function TE (tweak expansion)

Input:
    T    4w-bit tweak
Output:
    C    32w-bit tweak schedule
Pseudo-code:
    $(t_0, t_1, t_2, t_3) \leftarrow T$
    for $k \leftarrow 0, 1, 2, \ldots, 31$ loop
        $C_k \leftarrow t_0$
        $(t_0, t_1, t_2, t_3) \leftarrow (t_1, t_2, t_3, t_0)$
    end loop
Relations:
    $C = (C_0, C_1, C_2, \ldots, C_{31})$
    For $0 \le k < 32$:
        $C_k = t_0^{(k)}$
        $t_0^{(k+1)} = t_1^{(k)}$
        $t_1^{(k+1)} = t_2^{(k)}$
        $t_2^{(k+1)} = t_3^{(k)}$
        $t_3^{(k+1)} = t_0^{(k)}$
    $(t_0^{(0)}, t_1^{(0)}, t_2^{(0)}, t_3^{(0)}) = T$

NOTE. For $T_3 T_2 T_1 T_0 = T$, the tweak schedule is

    $\mathrm{TE}(T) = (T_0, T_1, T_2, T_3, T_0, T_1, T_2, T_3, \ldots, T_0, T_1, T_2, T_3)$
3.3. Key schedule. The key register $(z_0, z_1, z_2, z_3, z_4)$ is initially loaded with the
key Z. The key words are generated in 64 rounds of operations.

For the k-th round, $0 \le k \le 63$, the value of the word $z_3$ of the key register is taken
as the key word $K_k$ [which enters the G-box of the $\lfloor k/2 \rfloor$-th encryption round as the
first key word if k is even, or as the second key word if k is odd]. The key register is
then rotated by one word toward the least significant word. The rotation is similar
to that on the text register and the tweak register. (See Figure 3.1.)

NOTE. For $Z_4 Z_3 Z_2 Z_1 Z_0 = Z$, the key schedule is

    $\mathrm{KE}(Z) = (Z_3, Z_4, Z_0, Z_1, Z_2, Z_3, Z_4, Z_0, Z_1, Z_2, \ldots, Z_3, Z_4, Z_0, Z_1)$

3.4. Unit schedule. The unit register u is initially loaded with the unit key U.
Unit words are generated in 64 rounds of operations.

For the k-th round, $0 \le k \le 63$, the value of the unit register u is taken as the
unit word $L_k$ [which, similarly to the key word $K_k$, enters the G-box of the $\lfloor k/2 \rfloor$-th
encryption round as the first unit word if k is even, or as the second unit word if
k is odd]. The register is then incremented modulo $2^w$ by the [key-dependent] constant
$2U \boxplus 1$ to become ready for the next round. (See Figure 3.1.)

NOTE. Given unit key U, the unit schedule is

    $\mathrm{UE}(U) = (U, 3U \boxplus 1, 5U \boxplus 2, 7U \boxplus 3, \ldots, 127U \boxplus 63)$
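
The three schedules of Sections 3.2 to 3.4 can be sketched together in a few lines of Python. The function names `te`, `ke` and `ue` mirror TE, KE and UE; representing registers as low-first tuples of words is an assumption made for this illustration.

```python
def te(t_words):
    """Tweak schedule: 32 words read from t0 while rotating (t0, t1, t2, t3)."""
    t, out = list(t_words), []
    for _ in range(32):
        out.append(t[0])
        t = t[1:] + t[:1]            # rotate toward the least significant word
    return out


def ke(z_words):
    """Key schedule: 64 words read from z3 while rotating (z0, ..., z4)."""
    z, out = list(z_words), []
    for _ in range(64):
        out.append(z[3])
        z = z[1:] + z[:1]
    return out


def ue(u, w):
    """Unit schedule: 64 words, stepping the unit register by 2U + 1 mod 2^w."""
    mask = (1 << w) - 1
    step = (2 * u + 1) & mask        # constant derived from the unit key U
    out = []
    for _ in range(64):
        out.append(u)
        u = (u + step) & mask
    return out
```

For the example key of Section 4, `ke` starts with the words 0x0777, 0x8888, 0x0000, and `ue(0x1998, 16)` starts with 0x1998, 0x4CC9, 0x7FFA, in agreement with the closed forms in the NOTEs above.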
Algorithm 4 Function KE (key expansion)

Input:
    Z    5w-bit key
Output:
    K    64w-bit key schedule
Pseudo-code:
    $(z_0, z_1, z_2, z_3, z_4) \leftarrow Z$
    for $k \leftarrow 0, 1, 2, \ldots, 63$ loop
        $K_k \leftarrow z_3$
        $(z_0, z_1, z_2, z_3, z_4) \leftarrow (z_1, z_2, z_3, z_4, z_0)$
    end loop
Relations:
    $K = (K_0, K_1, K_2, \ldots, K_{63})$
    For $0 \le k < 64$:
        $K_k = z_3^{(k)}$
        $z_0^{(k+1)} = z_1^{(k)}$
        $z_1^{(k+1)} = z_2^{(k)}$
        $z_2^{(k+1)} = z_3^{(k)}$
        $z_3^{(k+1)} = z_4^{(k)}$
        $z_4^{(k+1)} = z_0^{(k)}$
    $(z_0^{(0)}, z_1^{(0)}, z_2^{(0)}, z_3^{(0)}, z_4^{(0)}) = Z$

Algorithm 5 Function UE (unit element)

Input:
    U    w-bit unit key
Output:
    L    64w-bit unit schedule
Pseudo-code:
    $u \leftarrow U$
    for $k \leftarrow 0, 1, 2, \ldots, 63$ loop
        $L_k \leftarrow u$
        $u \leftarrow u \boxplus 2U \boxplus 1$
    end loop
Relations:
    $L = (L_0, L_1, L_2, \ldots, L_{63})$
    For $k = 0, 1, 2, \ldots, 63$:
        $L_k = u^{(k)}$
        $u^{(k+1)} = u^{(k)} \boxplus 2U \boxplus 1$
    $u^{(0)} = U$
    Or, equivalently, for every k:
        $u^{(k)} = U \odot k$



3.5. G-box. The G-box implements a permutation G (see Figure 3.2) that takes
as argument a text word and is parametrized by an ordered pair of key words
$(K_0, K_1)$, an ordered pair of unit words $(L_0, L_1)$ and a tweak word $C_0$ to return
a text word as the result. The G-box operates on a word register that initially
contains the argument and finally contains the result. The G-box proceeds in two
rounds, each consisting of an operation $\boxdot_e$ followed by a half-word swap S. The two
rounds are separated by an exclusive-or (XOR) operation.

Figure 3.2. Permutation G.

For the first round, the operation $\boxdot_e$ takes the contents of the register as the left
operand, $K_0$ as the right operand, and $L_0$ as its right unit element. The result
is stored back to the register. The register is then modified by operation S, i.e.
swapping the contents of its high and low order halves.

For the inter-round XOR operation, the register is modified by XOR'ing its
contents with the tweak word $C_0$ and storing the result back to it.

For the second round, the register is processed similarly to the first round with
$K_1$ and $L_1$ being used instead of $K_0$ and $L_0$, respectively.
NOTES.

(1) The cipher uses 64 distinct instances from the family of operations $\boxdot_e$.

(2) Alternatively, it may be seen as using 64 identical instances of the single
operation $\boxdot$ or $\odot$, but the operands and result of each instance are "biased"
by adding or subtracting the constant $L_0$ (or $L_1$) that is specific to the
instance and, furthermore, when seen as $\odot$, the left operand enters and the
result leaves it with altered sign.

(3) Like Skipjack, the G-box permutes $K_0$ (or $K_1$) while keeping x and other
parameters fixed. Unlike Skipjack, the G-box doesn't permute the word
$(\mathrm{Hi}(K_0), \mathrm{Lo}(K_1))$, where $\mathrm{Hi}(\cdot)$ and $\mathrm{Lo}(\cdot)$ stand for the high and the low
order half respectively.

(4) Unlike Skipjack, diffusion in the G-box is incomplete, i.e. not every input
bit affects all output bits. Indeed, the v-th bit of the argument, with
$v > w/2$, affects only all bits of the low order half and bits v through $w - 1$
of the result; bits $w/2$ through $v - 1$ remain unaffected.

(5) If $(K_0, K_1) = (L_0, L_1) \land C_0 = 0$ then G becomes the identity.
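
The G-box description translates almost line by line into code. Below is a minimal Python sketch assuming w = 16; the names `boxdot_e`, `swap` and `g_box` are illustrative rather than taken from the reference implementation, and `boxdot_e` uses the closed-form polynomial of Section 2.1.

```python
W = 16                      # word length in bits (assumed for illustration)
MASK = (1 << W) - 1
HALF = W // 2


def boxdot_e(x, z, e):
    """Keyed word operation with right unit e: 2xz + (1 - 2e)(x - z + e) (mod 2^w)."""
    return (2 * x * z + (1 - 2 * e) * (x - z + e)) & MASK


def swap(x):
    """Operation S: swap the high and low order halves of a word."""
    return ((x << HALF) | (x >> HALF)) & MASK


def g_box(x, keys, units, tweak):
    """Two rounds of (boxdot_e, S) separated by an XOR with the tweak word."""
    (k0, k1), (l0, l1) = keys, units
    x = swap(boxdot_e(x, k0, l0))    # first round
    x ^= tweak                       # inter-round XOR
    return swap(boxdot_e(x, k1, l1)) # second round


# First G-box call of the example in Section 4 (parameters of round 0):
y = g_box(0xCDEF, (0x0777, 0x8888), (0x1998, 0x4CC9), 0x4444)
```

With these inputs the sketch yields 0x3884, which is the word $x_3$ listed for text round 1 in Table 1.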

Algorithm 6 Permutation G

Input:
    x           w-bit text word
    $(K_0, K_1)$    pair of w-bit key words
    $(L_0, L_1)$    pair of w-bit unit words
    $C_0$          w-bit tweak word
Output:
    y           w-bit text word
Pseudo-code:
    $x \leftarrow x \boxdot_{L_0} K_0$
    $x \leftarrow x^S$
    $x \leftarrow x \oplus C_0$
    $x \leftarrow x \boxdot_{L_1} K_1$
    $y \leftarrow x^S$
Relation:
    $G(x, (K_0, K_1), (L_0, L_1), C_0) = (((x \boxdot_{L_0} K_0)^S \oplus C_0) \boxdot_{L_1} K_1)^S$

3.6. Decryption. Decryption can be easily derived from encryption. Namely, if

    $Y = \mathrm{CRYPT}(X, \mathrm{KE}(Z), \mathrm{UE}(U), \mathrm{TE}(T))$

then it immediately follows that

    $\mathrm{CRYPT}\left(Y^{RS}, \frac{\mathrm{UE}(U)^R}{\mathrm{KE}(Z^R)}, \mathrm{UE}(U)^R, \mathrm{TE}(T^{RS})\right) = X^{RS}$

In other words, encrypting the cipher block in reverse half-word order ($Y^{RS}$)
using the tweak in reverse half-word order ($T^{RS}$), the unit schedule in reverse word
order ($\mathrm{UE}(U)^R$), and the key schedule consisting of inverse words of the one expanded
from the key in reverse word order ($Z^R$), where the inversions are those of the
quasi-groups defined by the operation $\boxdot_e$ and each quasi-group is uniquely given by its
right unit that is the index-matching word of the encryption unit schedule in reverse
word order ($\mathrm{UE}(U)^R$), recovers the plain block in reverse half-word order ($X^{RS}$).

Algorithm 7 Function DECRYPT

Input:
    Y    4w-bit ciphertext block
    Z    5w-bit key
    T    4w-bit tweak
    U    w-bit unit key
Output:
    X    4w-bit plaintext block
Relation:
    $\mathrm{DECRYPT}(Y, Z, T, U) = \mathrm{CRYPT}\left(Y^{RS}, \frac{\mathrm{UE}(U)^R}{\mathrm{KE}(Z^R)}, \mathrm{UE}(U)^R, \mathrm{TE}(T^{RS})\right)^{RS}$
NOTES.

(1) The full cipher is illustrated in Figure 3.3, where $X_3 X_2 X_1 X_0 = X$, $Y_3 Y_2 Y_1 Y_0 = Y$,
$Z_4 Z_3 Z_2 Z_1 Z_0 = Z$ and $T_3 T_2 T_1 T_0 = T$ (the unit schedule is omitted).
The figure is obtained by "unrolling" (i.e. eliminating all rotations of) the
dataflow graph of the full cipher that would be obtained by cascading the
individual rounds as in Figure 3.1.

(2) The overall structure is, up to word indexing, identical to that of Skipjack.
The word re-indexing, which is cryptographically insignificant, was
introduced to ease description and illustration.

(3) As with Skipjack, decryption is similar to encryption. To decrypt with
Skipjack, one swaps adjacent words in the cipher and the plain block and swaps
adjacent word pairs in the key. To decrypt with NSABC, one reverses the
word order, i.e. swaps the first and the last words as well as the second
and the second-to-last ones, in the text block, the tweak and the key.
For Skipjack, one also swaps the high and low order halves of every word. For
NSABC, one swaps the high and low order halves of every word except those of the
key.

(4) Unlike Skipjack, just swapping the words and half-words doesn't turn
encryption into decryption: one needs to invert the key words too. Thus, although
ENCRYPT and DECRYPT can be expressed explicitly in terms of CRYPT,
DECRYPT cannot be expressed explicitly in terms of ENCRYPT.
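
The decryption relation can be exercised end to end with a compact Python sketch. It is a sketch under stated assumptions rather than the authors' reference code: w = 16, blocks and keys are low-first tuples of words, the schedules are written in closed form, and all function names are illustrative.

```python
W = 16                      # word length in bits (assumed for illustration)
MASK = (1 << W) - 1
HALF = W // 2


def boxdot_e(x, z, e):
    return (2 * x * z + (1 - 2 * e) * (x - z + e)) & MASK


def inv_e(z, e):
    """Right inverse e/z of the key word z for the quasi-group with right unit e."""
    num = (2 * e - 1) * z - 2 * e * (e - 1)
    return (num * pow((2 * (z - e) + 1) & MASK, -1, 1 << W)) & MASK


def swap(x):
    return ((x << HALF) | (x >> HALF)) & MASK


def g_box(x, k0, k1, l0, l1, c):
    x = swap(boxdot_e(x, k0, l0)) ^ c
    return swap(boxdot_e(x, k1, l1))


def crypt(x, K, L, C):
    x0, x1, x2, x3 = x
    for k in range(32):
        args = (K[2 * k], K[2 * k + 1], L[2 * k], L[2 * k + 1], C[k])
        if (k // 8) % 2 == 0:                  # type A round
            x0 = g_box(x0, *args)
            x1 ^= x0
        else:                                  # type B round
            x3 ^= x0
            x0 = g_box(x0, *args)
        x0, x1, x2, x3 = x1, x2, x3, x0
    return (x0, x1, x2, x3)


def ke(z):
    return [z[(3 + k) % 5] for k in range(64)]


def ue(u):
    return [((2 * k + 1) * u + k) & MASK for k in range(64)]


def te(t):
    return [t[k % 4] for k in range(32)]


def rs(words):
    """Reverse the word order and swap the halves of every word."""
    return tuple(swap(v) for v in reversed(words))


def encrypt(x, z, t, u):
    return crypt(x, ke(z), ue(u), te(t))


def decrypt(y, z, t, u):
    L = ue(u)[::-1]                                          # UE(U)^R
    K = [inv_e(k, e) for k, e in zip(ke(z[::-1]), L)]        # UE(U)^R / KE(Z^R)
    return rs(crypt(rs(y), K, L, te(rs(t))))
```

Note that `decrypt` reuses `crypt` unchanged; only its inputs are re-ordered and the key words inverted, which is exactly why the inversions dominate the cost of setting up decryption.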

3.7. Tweak derivation. The 4w-bit secret tweak T is used to encrypt only one
block [under a given key Z and unit key U]. In order to encrypt multiple blocks, the
tweak is derived from the block index and a 4w-bit [additional] key, called the tweak
key, as follows. Let $T^{(j)}$ denote the tweak used to encrypt the j-th block. For the first
block (j = 0), the tweak key is used as the tweak directly:

    $T^{(0)} = \text{tweak key}$

The subsequent tweak is computed from the current tweak by the recurrence
relation:

    $T^{(j+1)} = T^{(j)} \boxplus 2T^{(0)} \boxplus 1$

or, equivalently,

    $T^{(j)} = T^{(0)} \odot j$

where all operands are regarded as 4w-bit numbers and all operators are defined
on 4w-bit arithmetic, i.e. mod $2^{4w}$.
NOTES.

(1) The third relation, where $T^{(0)}$ conveniently designates the [unnamed] tweak
key, is meant for random access. The family of functions $\{T : j \mapsto T^{(0)} \odot j\}$,
parametrized by the tweak key $T^{(0)}$, is not $\epsilon$-almost 2-XOR universal
according to the definition in [LRW02]. Eventual application of this family in
the Liskov-Rivest-Wagner construction, i.e. encryption by
$\mathrm{CRYPT}(X \oplus T^{(j)}, \mathrm{KE}(Z), \mathrm{UE}(U), 0) \oplus T^{(j)}$, is therefore impossible.

(2) For efficient random access, applications may opt to use non-flat spaces
of the block index j. For example, an application that encrypts relational
databases may define the index in the format $j = j_4 j_3 j_2 j_1 j_0$, where $j_4$ is the
database number, $j_3$ is the table number within the database, $j_2$ is the row number
within the table, $j_1$ is the field number within the row and $j_0$ is the block number
within the field.

(3) Tweaking must be disabled when the cipher is used as a permutation, i.e.
to generate a sequence of unique numbers.

(4) Tweaking should be enabled in all other modes of operation. For example,
a non-tweakable block cipher can generate a sequence of independent
numbers by encrypting a counter block in Cipher Block Chaining (CBC)
mode; NSABC can generate a similar sequence with virtually the same
cycle length by encrypting a constant block in a "tweaked CBC" mode.
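
The two forms of the tweak derivation (sequential recurrence and random-access formula) can be compared with a short Python sketch. The function names `next_tweak` and `tweak_at` are hypothetical; tweaks are handled as 4w-bit integers, as the text prescribes.

```python
def next_tweak(t_j, t0, w):
    """Sequential form: T(j+1) = T(j) + 2*T(0) + 1 (mod 2^(4w))."""
    mask = (1 << (4 * w)) - 1
    return (t_j + 2 * t0 + 1) & mask


def tweak_at(j, t0, w):
    """Random-access form: T(j) = 2*T(0)*j + T(0) + j (mod 2^(4w))."""
    mask = (1 << (4 * w)) - 1
    return (2 * t0 * j + t0 + j) & mask


# Walk the first few tweaks derived from the example tweak key of Section 4.
tweak_key = 0x0001002203334444
tweaks = [tweak_key]
for _ in range(5):
    tweaks.append(next_tweak(tweaks[-1], tweak_key, 16))
```

The random-access form costs one wide multiplication per block, whereas the sequential form needs only additions, so an implementation that processes blocks in order may prefer the recurrence.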
Figure 3.3. The full cipher, by "unrolling".
4. Example
An encipherment in NSABC/16 with

    X = 0x0123456789ABCDEF
    Z = 0x88880777006600050000
    T = 0x0001002203334444
    U = 0x1998

results in

    Y = 0x88B14E700F51921E

Table 1 lists the states of the [conceptual] processor during the encipherment, i.e.
the contents of all registers at the start of round k for k = 0, 1, 2, ..., 64 for the key
schedule and unit schedule, and k = 0, 1, 2, ..., 32 for the tweak schedule and text
encryption. The start of round 64 (32) conveniently means the end of round 63
(31), which is that of the entire algorithm.

5. Notes on implementation 

This section provides methods for efficient software implementation for two types 
of environment: memory-constrained, such as embedded computers, and memory- 
abundant, such as servers and personal computers. 

5.1. Memory-constrained environment. The function ENCRYPT can be im- 
plemented without using any writeable memory on a processor with at least 16 
word registers: 

• 4 for {xq,xi,X2,X3) - text register 

• 5 for (zq, zi, Z2, 23, Z4) - key register 

• 4 for {to,ti,t2,t3) - tweak register 

• 1 for u - unit register, 

• 1 for the constant value 2U ffl 1, and 

• 1 for k - round index. 

Indeed, the schedules K, L and C can vanish because every word of them, once 
produced, can be consumed immediately, provided that the functions KE, TE, 
UE and CRYPT are programmed to run in parallel and synchronized with each 
increment of k. Source code of this implementation is given in Appendix A. 

Unlike ENCRYPT, DECRYPT needs memory for the key schedule because on- 
the-fly modular multiplicative inversion is too slow to be practical. In this envi- 
ronment, modes of operation that avoid DECRYPT (i.e. ones using ENCRYPT to 
decrypt) are thus preferable. 
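
To make the cost concrete, here is a hedged sketch of undoing a single word operation for w = 32. It uses the affine form y = mx + n described in Section 5.2 and a Newton-iteration inverse in the style of the routine in Appendix B. The function names inverse32, op_forward and op_backward are hypothetical and are not part of the cipher specification.

```c
#include <stdint.h>

/* Multiplicative inverse of an odd x modulo 2^32 by Newton iteration.
   Each step doubles the number of correct low-order bits. */
uint32_t inverse32(uint32_t x)
{
    uint32_t y = 2u - x;     /* x*y == 1 (mod 4)      */
    y *= 2u - x*y;           /* x*y == 1 (mod 16)     */
    y *= 2u - x*y;           /* x*y == 1 (mod 256)    */
    y *= 2u - x*y;           /* x*y == 1 (mod 65536)  */
    y *= 2u - x*y;           /* x*y == 1 (mod 2^32)   */
    return y;
}

/* Forward direction: y = m*x + n with m = 2(z - e) + 1, n = (2e - 1)(z - e). */
uint32_t op_forward(uint32_t x, uint32_t z, uint32_t e)
{
    uint32_t m = 2u*(z - e) + 1u;
    uint32_t n = (2u*e - 1u)*(z - e);
    return m*x + n;
}

/* Backward direction: x = (y - n) * m^(-1). The multiplier m is always odd,
   so the inverse exists; computing it costs several multiplications. */
uint32_t op_backward(uint32_t y, uint32_t z, uint32_t e)
{
    uint32_t m = 2u*(z - e) + 1u;
    uint32_t n = (2u*e - 1u)*(z - e);
    return (y - n) * inverse32(m);
}
```

The backward direction needs four extra dependent multiply-subtract steps per key word, which is why a decrypting implementation would rather invert the schedule once and store it.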

5.2. Memory-abundant environment. The quasi-group operation □_e can be
evaluated by only one multiplication and one addition. Indeed,

    x □_e z = mx ⊞ n

where x is a text word, z a key word, and

    m = 2(z - e) ⊞ 1
    n = (2e - 1) ⊠ (z - e)

So, instead of using the (z, e) pairs, one may pre-compute the (m, n) pairs once
and use them many times.
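
The identity can be checked numerically. The sketch below, for w = 32, compares the direct formula used by the function o in Appendix A with the pre-computed affine form. The struct affine_key and the function names op_direct, precompute and op_affine are hypothetical names chosen for this illustration.

```c
#include <stdint.h>

/* Direct evaluation, following the formula of the function o in Appendix A. */
uint32_t op_direct(uint32_t x, uint32_t z, uint32_t e)
{
    return 2u*x*z + (1u - 2u*e)*(x - z + e);
}

/* A pre-computed (m, n) pair for one (z, e) pair. */
typedef struct {
    uint32_t m;   /* multiplier 2(z - e) + 1, always odd */
    uint32_t n;   /* offset (2e - 1)(z - e)              */
} affine_key;

affine_key precompute(uint32_t z, uint32_t e)
{
    affine_key k;
    k.m = 2u*(z - e) + 1u;
    k.n = (2u*e - 1u)*(z - e);
    return k;
}

/* One multiplication and one addition per text word. */
uint32_t op_affine(uint32_t x, affine_key k)
{
    return k.m*x + k.n;
}
```

Expanding 2xz + (1 - 2e)(x - z + e) and collecting the terms in x gives x(2(z - e) + 1) + (2e - 1)(z - e), so the two functions agree for all inputs modulo 2^32.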
Table 1. Processor states during an encipherment by NSABC/16

       Unit   Key register                  Tweak register     Text register
  k    u      z4 z3 z2 z1 z0           k    t3 t2 t1 t0        x3 x2 x1 x0

  0    1998   88880777006600050000     0    0001002203334444   0123456789ABCDEF
  1    4CC9   00008888077700660005
  2    7FFA   00050000888807770066     1    4444000100220333   388401234567B12F
  3    B32B   00660005000088880777
  4    E65C   07770066000500008888     2    0333444400010022   1E90388401235BF7
  5    198D   88880777006600050000
  6    4CBE   00008888077700660005     3    0022033344440001   60AC1E903884618F
  7    7FEF   00050000888807770066
  8    B320   00660005000088880777     4    0001002203334444   499160AC1E907115
  9    E651   07770066000500008888
 10    1982   88880777006600050000     5    4444000100220333   C2D7499160ACDC47
 11    4CB3   00008888077700660005
 12    7FE4   00050000888807770066     6    0333444400010022   F1EFC2D749919143
 13    B315   00660005000088880777
 14    E646   07770066000500008888     7    0022033344440001   03D2F1EFC2D74A43
 15    1977   88880777006600050000
 16    4CA8   00008888077700660005     8    0001002203334444   273D03D2F1EFE5EA
 17    7FD9   00050000888807770066
 18    B30A   00660005000088880777     9    4444000100220333   1615C2D703D2F1EF
 19    E63B   07770066000500008888
 20    196C   88880777006600050000    10    0333444400010022   A9B6E7FAC2D703D2
 21    4C9D   00008888077700660005
 22    7FCE   00050000888807770066    11    0022033344440001   18C0AA64E7FAC2D7
 23    B2FF   00660005000088880777
 24    E630   07770066000500008888    12    0001002203334444   B049DA17AA64E7FA
 25    1961   88880777006600050000
 26    4C92   00008888077700660005    13    4444000100220333   851857B3DA17AA64
 27    7FC3   00050000888807770066
 28    B2F4   00660005000088880777    14    0333444400010022   71F82F7C57B3DA17
 29    E625   07770066000500008888
 30    1956   88880777006600050000    15    0022033344440001   D5F0ABEF2F7C57B3
 31    4C87   00008888077700660005
 32    7FB8   00050000888807770066    16    0001002203334444   E5118243ABEF2F7C
 33    B2E9   00660005000088880777
 34    E61A   07770066000500008888    17    4444000100220333   94FAE51182433F15
 35    194B   88880777006600050000
 36    4C7C   00008888077700660005    18    0333444400010022   7BB394FAE511F9F0
 37    7FAD   00050000888807770066
 38    B2DE   00660005000088880777    19    0022033344440001   11747BB394FAF465
 39    E60F   07770066000500008888
 40    1940   88880777006600050000    20    0001002203334444   D14F11747BB345B5
 41    4C71   00008888077700660005
 42    7FA2   00050000888807770066    21    4444000100220333   0385D14F11747836
 43    B2D3   00660005000088880777
 44    E604   07770066000500008888    22    0333444400010022   873B0385D14F964F
 45    1935   88880777006600050000
 46    4C66   00008888077700660005    23    0022033344440001   CB9B873B03851AD4
 47    7F97   00050000888807770066
 48    B2C8   00660005000088880777    24    0001002203334444   D6FCCB9B873BD579
 49    E5F9   07770066000500008888
 50    192A   88880777006600050000    25    4444000100220333   D4CF0385CB9B873B
 51    4C5B   00008888077700660005
 52    7F8C   00050000888807770066    26    0333444400010022   779F53F40385CB9B
 53    B2BD   00660005000088880777
 54    E5EE   07770066000500008888    27    0022033344440001   8CECBC0453F40385
 55    191F   88880777006600050000
 56    4C50   00008888077700660005    28    0001002203334444   C93A8F69BC0453F4
 57    7F81   00050000888807770066
 58    B2B2   00660005000088880777    29    4444000100220333   2E1A9ACE8F69BC04
 59    E5E3   07770066000500008888
 60    1914   88880777006600050000    30    0333444400010022   8038921E9ACE8F69
 61    4C45   00008888077700660005
 62    7F76   00050000888807770066    31    0022033344440001   D4BE0F51921E9ACE
 63    B2A7   00660005000088880777
 64    E5D8   07770066000500008888    32    0001002203334444   88B14E700F51921E

The cipher is parallelizable. The following procedure executes all 32 rounds in
20 steps, half of which perform two or three parallel evaluations of G. Recall
that g^(k) is the result of G in round k.



(1) Compute g^(0)
(2) Compute g^(1)
(3) Compute g^(2)
(4) Compute g^(3)
(5) Compute g^(4)
(6) Compute g^(5), g^(11) in parallel
(7) Compute g^(6), g^(9) in parallel
(8) Compute g^(7), g^(10), g^(13) in parallel
(9) Compute g^(8), g^(14) in parallel
(10) Compute g^(12), g^(15) in parallel
(11) Compute g^(16)
(12) Compute g^(17)
(13) Compute g^(18)
(14) Compute g^(19)
(15) Compute g^(20)
(16) Compute g^(21), g^(27) in parallel
(17) Compute g^(22), g^(25) in parallel
(18) Compute g^(23), g^(26), g^(29) in parallel
(19) Compute g^(24), g^(30) in parallel
(20) Compute g^(28), g^(31) in parallel



The procedure becomes evident by examining the dataflow graph of the cipher,
shown in Figure 5.1, which is obtained by "unrolling" the one in Figure 3.3. Here
"unrolling" means introducing a rotation so that the G-boxes with congruent round
indices (mod 3) lie on a straight line.
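
The 20-step schedule can also be derived mechanically. The sketch below is an illustration under stated assumptions, not part of the paper: it runs the round structure of Appendix A symbolically, representing each register as the set of G outputs and input words whose XOR it holds (so that terms that cancel drop out), and assigns to every round the earliest step at which its input is available. The function names ready_step, schedule_rounds and count_parallel_steps are hypothetical.

```c
#include <stdint.h>

/* A register value is a symbolic XOR of round outputs g(0)..g(31)
   (bits 0..31) and input words X0..X3 (bits 32..35), kept as a bit set.
   XOR of two values is then XOR of the bit sets. */

/* Latest step among the round outputs contained in v (0 if only inputs). */
static int ready_step(uint64_t v, const int step[32])
{
    int s = 0;
    for (int k = 0; k < 32; k++)
        if (((v >> k) & 1u) && step[k] > s)
            s = step[k];
    return s;
}

/* Fill step[k] with the earliest step in which g(k) can be computed and
   return the total number of steps. */
int schedule_rounds(int step[32])
{
    uint64_t x0 = 1ULL << 32, x1 = 1ULL << 33,
             x2 = 1ULL << 34, x3 = 1ULL << 35;
    int last = 0;
    for (int k = 0; k < 32; k++) {
        /* x0 only contains outputs of earlier rounds, so step[] is
           already defined for every bit that is set. */
        step[k] = ready_step(x0, step) + 1;
        if (step[k] > last)
            last = step[k];
        uint64_t g = 1ULL << k;
        if (k & 8) {             /* B-round: x3 ^= x0, then x0 = G(x0) */
            x3 ^= x0;
            x0 = g;
        } else {                 /* A-round: x0 = G(x0), then x1 ^= x0 */
            x0 = g;
            x1 ^= x0;
        }
        uint64_t x = x0; x0 = x1; x1 = x2; x2 = x3; x3 = x;
    }
    return last;
}

/* Number of steps that evaluate two or more G-boxes in parallel. */
int count_parallel_steps(const int step[32], int nsteps)
{
    int count = 0;
    for (int s = 1; s <= nsteps; s++) {
        int width = 0;
        for (int k = 0; k < 32; k++)
            if (step[k] == s)
                width++;
        if (width >= 2)
            count++;
    }
    return count;
}
```

The symbolic cancellation matters: in round 8, for example, x3 becomes g(7) XOR g(4) XOR g(7) = g(4), which is why g(11) can already be computed in step 6 alongside g(5).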

On an x86-64 processor in 32-bit mode (w = 32), the procedure takes about 256
clock cycles, i.e. 256/16 = 16 clock cycles per byte encrypted. (The source code
of this implementation is given in Appendix B.) In 64-bit mode (w = 64), it takes
about 384 clock cycles, i.e. 384/32 = 12 clock cycles per byte.

The procedure may also be coded twice, i.e. it may be run in two instances in
parallel on a single core of the processor, with the second instance delayed by a few
steps after the first, to encrypt two blocks, possibly under different tweaks and/or
keys. This method has been shown to be effective for x86-64 processors in 64-bit
mode, resulting in about 9 clock cycles per byte.

6. Conclusion 

We defined NSABC, a block cipher utilizing a group operation that is essentially 
modular multiplication of machine words, a powerful operation available on many 
processors. 

NSABC was meant to be elegant. It uses no S-boxes or "magic" constants. It uses
only machine word-oriented algebraic operations. It makes use of the simple and
regular structure of Skipjack, which has been publicly known for over a decade,
sufficient time to be truly understood. Its elegance makes it easy to memorize,
realize and analyze.

NSABC is based on a valuable design by a well-reputed agency in the field.
We therefore believe that it is worth analyzing and that it can withstand rigorous
analysis. If this happens to be true, then we may have a practical cipher with
256-bit blocks, allowing an enormous amount of data to be encrypted under the same
key, and with 320-bit keys, allowing data to be protected over every imaginable
time span.

Figure 5.1. The full cipher, by another "unrolling".



In cipher design there is always a trade-off between security and efficiency, and 
designers always have to ask: "What do we want, a very strong and fairly fast 
cipher, or fairly strong but very fast?" 

NSABC reflects the authors' view on the dilemma. If Skipjack is regarded as very
strong and just fairly fast, then NSABC may be regarded as a design emphasizing
the second aspect: make it very fast, albeit just fairly strong. For w-bit word
length, the NSABC key length is 5w bits, optionally plus 5w bits more, whilst the
true level of security is yet to be determined. On the other hand, on a modern
64-bit processor it takes only about 9 clock cycles to encrypt a byte.

NSABC is thus fast enough to be comparable to any modern block cipher.

APPENDIX A. A reference implementation of NSABC/32 (ENCRYPT only)

#include <stdint.h>
#include <stdlib.h>   // _rotl: 32-bit rotate-left intrinsic (declared here by MSVC)

typedef uint32_t word;
// ---------------------------------------------------------------------
// The word operation: 2xy + (1 - 2e)(x - y + e) mod 2**32
static word o(word x, word y, word e)
{
    return 2*x*y + (1 - 2*e)*(x - y + e);
}
// ---------------------------------------------------------------------
// Two-round cascade of the word operation, with the tweak XORed in between
static word G(word x, word z0, word z1, word u0, word u1, word t)
{
    x = o(x, z0, u0);
    x = _rotl(x, 16);
    x ^= t;
    x = o(x, z1, u1);
    x = _rotl(x, 16);
    return x;
}
// ---------------------------------------------------------------------
void encrypt(word Y[4], word const X[4], word const Z[5],
             word const T[4], word U)
{
    word
        x0 = X[0], x1 = X[1], x2 = X[2], x3 = X[3],
        z0 = Z[0], z1 = Z[1], z2 = Z[2], z3 = Z[3], z4 = Z[4],
        t0 = T[0], t1 = T[1], t2 = T[2], t3 = T[3],
        u0 = U,
        u1 = u0 + 2*U + 1;
    for (int k = 0; k < 32; k++)
    {
        if (k & 8)   // B-round
        {
            x3 ^= x0;
            x0 = G(x0, z3, z4, u0, u1, t0);
        }
        else         // A-round
        {
            x0 = G(x0, z3, z4, u0, u1, t0);
            x1 ^= x0;
        }
        // rotate the text, key (by two words) and tweak registers
        word x = x0; x0 = x1; x1 = x2; x2 = x3; x3 = x;
        word z = z0; z0 = z2; z2 = z4; z4 = z1; z1 = z3; z3 = z;
        word t = t0; t0 = t1; t1 = t2; t2 = t3; t3 = t;
        u0 = u1 + 2*U + 1;
        u1 = u0 + 2*U + 1;
    }
    Y[0] = x0; Y[1] = x1; Y[2] = x2; Y[3] = x3;
}

APPENDIX B. An optimized implementation of NSABC/32

#include <cstring>    // memcpy, memcmp (used by test)
#include <iostream>   // cout, endl (used by test)
using namespace std;

// Pre-compute the (m, n) pairs of Section 5.2 for all 64 key words
void expandkey(word M[64], word N[64], word const Z[5], word U)
{
    word z0 = Z[0], z1 = Z[1], z2 = Z[2], z3 = Z[3], z4 = Z[4];
    word u = U;
    for (int k = 0; k < 64; k++)
    {
        M[k] = 2*(z3 - u) + 1;
        N[k] = (2*u - 1)*(z3 - u);
        u += 2*U + 1;
        word z = z0; z0 = z1; z1 = z2; z2 = z3; z3 = z4; z4 = z;
    }
}
// ---------------------------------------------------------------------
static inline
word G(word x, word t, word m0, word m1, word n0, word n1)
{
    x *= m0;
    x += n0;
    x = _rotl(x, 16);
    x ^= t;
    x *= m1;
    x += n1;
    x = _rotl(x, 16);
    return x;
}
// ---------------------------------------------------------------------
void crypt(word Y[4], word const X[4], word const T[4],
           word const M[64], word const N[64])
{
    // Step 1
    word const g0  = G(X[0],               T[0], M[0],  M[1],  N[0],  N[1]);
    // Step 2
    word const g1  = G(X[1]^g0,            T[1], M[2],  M[3],  N[2],  N[3]);
    // Step 3
    word const g2  = G(X[2]^g1,            T[2], M[4],  M[5],  N[4],  N[5]);
    // Step 4
    word const g3  = G(X[3]^g2,            T[3], M[6],  M[7],  N[6],  N[7]);
    // Step 5
    word const g4  = G(g0^g3,              T[0], M[8],  M[9],  N[8],  N[9]);
    // Step 6
    word const g5  = G(g1^g4,              T[1], M[10], M[11], N[10], N[11]);
    word const g11 = G(g4,                 T[3], M[22], M[23], N[22], N[23]);
    // Step 7
    word const g6  = G(g2^g5,              T[2], M[12], M[13], N[12], N[13]);
    word const g9  = G(g5,                 T[1], M[18], M[19], N[18], N[19]);
    // Step 8
    word const g7  = G(g3^g6,              T[3], M[14], M[15], N[14], N[15]);
    word const g10 = G(g6,                 T[2], M[20], M[21], N[20], N[21]);
    word const g13 = G(g6^g9,              T[1], M[26], M[27], N[26], N[27]);
    // Step 9
    word const g8  = G(g4^g7,              T[0], M[16], M[17], N[16], N[17]);
    word const g14 = G(g4^g10,             T[2], M[28], M[29], N[28], N[29]);
    // Step 10
    word const g12 = G(g5^g8,              T[0], M[24], M[25], N[24], N[25]);
    word const g15 = G(g5^g8^g11,          T[3], M[30], M[31], N[30], N[31]);
    // Step 11
    word const g16 = G(g6^g9^g12,          T[0], M[32], M[33], N[32], N[33]);
    // Step 12
    word const g17 = G(g4^g10^g13^g16,     T[1], M[34], M[35], N[34], N[35]);
    // Step 13
    word const g18 = G(g5^g8^g11^g14^g17,  T[2], M[36], M[37], N[36], N[37]);
    // Step 14
    word const g19 = G(g15^g18,            T[3], M[38], M[39], N[38], N[39]);
    // Step 15
    word const g20 = G(g16^g19,            T[0], M[40], M[41], N[40], N[41]);
    // Step 16
    word const g21 = G(g17^g20,            T[1], M[42], M[43], N[42], N[43]);
    word const g27 = G(g20,                T[3], M[54], M[55], N[54], N[55]);
    // Step 17
    word const g22 = G(g18^g21,            T[2], M[44], M[45], N[44], N[45]);
    word const g25 = G(g21,                T[1], M[50], M[51], N[50], N[51]);
    // Step 18
    word const g23 = G(g19^g22,            T[3], M[46], M[47], N[46], N[47]);
    word const g26 = G(g22,                T[2], M[52], M[53], N[52], N[53]);
    word const g29 = G(g22^g25,            T[1], M[58], M[59], N[58], N[59]);
    // Step 19
    word const g24 = G(g20^g23,            T[0], M[48], M[49], N[48], N[49]);
    word const g30 = G(g20^g26,            T[2], M[60], M[61], N[60], N[61]);
    Y[1] = g20^g26^g29;
    // Step 20
    word const g28 = G(g21^g24,            T[0], M[56], M[57], N[56], N[57]);
    word const g31 = G(g21^g24^g27,        T[3], M[62], M[63], N[62], N[63]);
    Y[2] = g21^g24^g27^g30;
    // Step 21
    Y[0] = g22^g25^g28;
    Y[3] = g31;
}
// ---------------------------------------------------------------------
// Multiplicative inverse of x (mod 2**32); x must be odd.
// Source code by Thomas Pornin, Usenet 2009.
word inverse(word x)
{
    word y = 2 - x;      // xy == 1 mod 4
    y *= 2 - x*y;        // xy == 1 mod 16
    y *= 2 - x*y;        // xy == 1 mod 256
    y *= 2 - x*y;        // xy == 1 mod 65536
    y *= 2 - x*y;        // xy == 1 mod 4294967296
    return y;
}
// ---------------------------------------------------------------------
void invertkey(word iM[64], word iN[64],
               word const M[64], word const N[64])
{
    // M, N, iM, iN must not overlap!
    for (int k = 0; k < 64; k++)
    {
        iM[k] = inverse(M[63-k]);
        iN[k] = - N[63-k] * iM[k];
    }
}
// ---------------------------------------------------------------------
void icrypt(word X[4], word const Y[4], word const T[4],
            word const iM[64], word const iN[64])
{
    word Xrs[4], Trs[4];
    Trs[0] = _rotl(T[3], 16);
    Trs[1] = _rotl(T[2], 16);
    Trs[2] = _rotl(T[1], 16);
    Trs[3] = _rotl(T[0], 16);
    Xrs[0] = _rotl(Y[3], 16);
    Xrs[1] = _rotl(Y[2], 16);
    Xrs[2] = _rotl(Y[1], 16);
    Xrs[3] = _rotl(Y[0], 16);
    crypt(Xrs, Xrs, Trs, iM, iN);
    X[0] = _rotl(Xrs[3], 16);
    X[1] = _rotl(Xrs[2], 16);
    X[2] = _rotl(Xrs[1], 16);
    X[3] = _rotl(Xrs[0], 16);
}
// ---------------------------------------------------------------------
// Testing against the reference implementation.
// random_word() is not defined in this listing; it is expected to return
// a random word.
void test()
{
    int const nTimes = 10000;
    int const nRep = 100;
    word X[4], Y[4], T[4], Z[5], M[64], N[64], iM[64], iN[64];
    for (int i = 0; i < 5; i++)
        Z[i] = random_word();
    for (int i = 0; i < 4; i++)
        T[i] = random_word();
    word U = random_word();

    // correctness of the optimized implementation
    expandkey(M, N, Z, U);
    for (int n = nTimes; n; n--)
    {
        for (int i = 0; i < 4; i++)
            X[i] = random_word();
        memcpy(Y, X, sizeof(X));
        for (int m = nRep; m; m--)
        {
            encrypt(X, X, Z, T, U);
            crypt(Y, Y, T, M, N);
        }
        if (memcmp(Y, X, sizeof(X)) != 0)
            cout << "crypt: incorrect encryption!" << endl;
    }

    // invertibility of the optimized implementation
    invertkey(iM, iN, M, N);
    for (int n = nTimes; n; n--)
    {
        for (int i = 0; i < 4; i++)
            X[i] = random_word();
        memcpy(Y, X, sizeof(X));
        for (int m = nRep; m; m--)
            crypt(Y, Y, T, M, N);
        for (int m = nRep; m; m--)
            icrypt(Y, Y, T, iM, iN);
        if (memcmp(Y, X, sizeof(X)) != 0)
            cout << "icrypt: incorrect decryption!" << endl;
    }
}



